[New features] Advanced Query Accelerator (AQUA) for Amazon Redshift is now generally available!
(The original Japanese article is available here.)
Amazon Redshift using RA3 instances, called Advanced Query Accelerator (AQUA) for Amazon Redshift was initially announced in AWS re:Invent 2019 and is now finally generally available. I am going to try it out and show you its process in this article.
Mr. Ozono from AWS Japan (AWSJ) confirms it in the tweet too, so it must be real.
お待たせしました。re:Invent 2019 での発表で反響あった Redshift AQUA が GA になりました。RA3.16XL および 4XL にて、追加コストなしに AQUA の機能を使えます。東京リージョンも対応です!
AWS announces general availability of AQUA for Amazon Redshift https://t.co/kfBHizUnEN
— J.Ozono (@jostandard) April 14, 2021
AQUA is available with the latest node family, RA3 (ra3.4xl or ra3.16xl) and with release version 1.0.24421 or later in the following AWS Regions:
- US East (N. Virginia) Region (us-east-1)
- US East (Ohio) Region (us-east-2)
- US West (Oregon) Region (us-west-2)
- Asia Pacific (Tokyo) Region (ap-northeast-1)
- Europe (Ireland) Region (eu-west-1)
So glad that it’s already available in the Tokyo Region as well!
What is AQUA (Advanced Query Accelerator)?
AQUA serves as a powerful query accelerator that is a hardware provided between the RA3 nodes (ra3.4xl or ra3.16xl) and S3 managed storage.
Here is the introduction by AWS Blog.
The storage system uses multiple cues, including data block temperature, data blockage, and workload patterns, to manage the cache for high performance
...
Building on the caches that I told you about earlier, and taking advantage of the AWS Nitro System and custom FPGA-based acceleration, AQUA pushes the computation needed to handle reduction and aggregation queries closer to the data. This reduces network traffic, offloads work from the CPUs in the RA3 nodes, and allows AQUA to improve the performance of those queries by up to 10x, at no extra cost and without any code changes. AQUA also makes use of a fast, high-bandwidth connection to Amazon Simple Storage Service (S3).
Creating a cluster using AQUA from a snapshot
This time, I tried creating a cluster which uses AQUA by restoring from a snapshot. You can choose either ra3.4xlarge or ra3.16xlarge as a node type. If you already have a cluster with either of these node types, the AQUA is configured as ‘Automatic’ by default. To start using AQUA, choose [Actions]-[Configure AQUA] and change the configuration from Automatic
to Turn On
in the following dialogue.
I created a cluster with the default configuration, Automatic. You have the following choices when you configure AQUA:
- Automatic (default)
- Redshift determines whether to use AQUA.
- As of today, Working with AQUA (Advanced Query Accelerator) states “Currently, AQUA isn't activated with this option, but this behavior is subject to change.” At this moment, this option means the same as
Turn Off
; AQUA won’t be activated.
- Turn On
- You choose to always use AQUA. AQUA can only be activated in certain AWS Regions and for ra3.4xlarge and ra3.16xlarge node types.
- Turn Off
- You choose not to use AQUA.
It turned Available after about 5 minutes. You can see the cluster is launched with AQUA of ‘Automatic’.
Creating test data
AQUA seems to excel at the acceleration of LIKE and SIMILAR TO in particular, so I prepared approximately 300 million of data.
dev=> create table lineitem ( dev(> l_orderkey bigint not null, dev(> l_partkey bigint, dev(> l_suppkey bigint, dev(> l_linenumber integer not null, dev(> l_quantity decimal(18,4), dev(> l_extendedprice decimal(18,4), dev(> l_discount decimal(18,4), dev(> l_tax decimal(18,4), dev(> l_returnflag varchar(1), dev(> l_linestatus varchar(1), dev(> l_shipdate date, dev(> l_commitdate date, dev(> l_receiptdate date, dev(> l_shipinstruct varchar(25), dev(> l_shipmode varchar(10), dev(> l_comment varchar(44)) dev-> distkey (l_orderkey) dev-> sortkey (l_receiptdate); CREATE TABLE dev=> copy lineitem from 's3://cm-bucket/redshift-immersionday-labs/data/lineitem-part/' dev-> iam_role 'arn:aws:iam::123456789012:role/AmazonRedshiftRole' dev-> region 'ap-northeast-1' gzip delimiter '|' compupdate preset; INFO: Load into table 'lineitem' completed, 303008217 record(s) loaded successfully. COPY dev=> select * from lineitem limit 1; -[ RECORD 1 ]---+---------------------------------------- l_orderkey | 7428384 l_partkey | 9121341 l_suppkey | 621360 l_linenumber | 4 l_quantity | 23.0000 l_extendedprice | 31323.4700 l_discount | 0.0900 l_tax | 0.0500 l_returnflag | R l_linestatus | F l_shipdate | 1992-01-02 l_commitdate | 1992-03-22 l_receiptdate | 1992-01-03 l_shipinstruct | DELIVER IN PERSON l_shipmode | FOB l_comment | haggle carefully about the furiously ir
Performance test of AQUA
I reviewed the behavior by explicitly changing Turn On/Off. To update the AQUA configuration, you can click [Actions]-[Configure AQUA].
Note:
After changing Turn ON/Off in the dialogue and clicking on [Save changes], the cluster is immediately rebooted to apply the change.
Testing queries
I executed the following SIMILAR TO
and LIKE
sample queries and measured each response time.
SIMILAR TO sample query
-- explain select sum(l_orderkey), count(*) from lineitem where l_comment similar to 'slyly %' or l_comment similar to 'plant %' or l_comment similar to 'fina %' or l_comment similar to 'quick %' or l_comment similar to 'slyly %' or l_comment similar to 'quickly %' or l_comment similar to ' %about%' or l_comment similar to ' final%' or l_comment similar to ' %final%' or l_comment similar to ' breach%' or l_comment similar to ' egular%' or l_comment similar to ' %closely%' or l_comment similar to ' closely%' or l_comment similar to ' %idea%' or l_comment similar to ' idea%' ;
LIKE sample query
-- explain select sum(l_orderkey), count(*) from lineitem where l_comment like 'slyly %' or l_comment like 'plant %' or l_comment like 'fina %' or l_comment like 'quick %' or l_comment like 'slyly %' or l_comment like 'quickly %' or l_comment like ' %about%' or l_comment like ' final%' or l_comment like ' %final%' or l_comment like ' breach%' or l_comment like ' egular%' or l_comment like ' %closely%' or l_comment like ' closely%' or l_comment like ' %idea%' or l_comment like ' idea%' ;
I searched strings contained in ‘l_comment’ and aggregated the respective records. I felt really guilty of making such queries with SIMILAR TO
and LIKE
just by connecting with or
, but I believed in AQUA’s capacity.
In the following results, the result cache was disabled to measure the processing time.
set enable_result_cache_for_session to off;
Comparison of SIMILAR TO
AQUA deactivated (Turn Off)
dev=> select sum(l_orderkey), count(*) dev-> from lineitem dev-> where dev-> l_comment similar to 'slyly %' or dev-> l_comment similar to 'plant %' or dev-> l_comment similar to 'fina %' or dev-> l_comment similar to 'quick %' or dev-> l_comment similar to 'slyly %' or dev-> l_comment similar to 'quickly %' or dev-> l_comment similar to ' %about%' or dev-> l_comment similar to ' final%' or dev-> l_comment similar to ' %final%' or dev-> l_comment similar to ' breach%' or dev-> l_comment similar to ' egular%' or dev-> l_comment similar to ' %closely%' or dev-> l_comment similar to ' closely%' or dev-> l_comment similar to ' %idea%' or dev-> l_comment similar to ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 215896.819 ms select sum(l_orderkey), count(*) from lineitem where l_comment similar to 'slyly %' or l_comment similar to 'plant %' or l_comment similar to 'fina %' or l_comment similar to 'quick %' or l_comment similar to 'slyly %' or l_comment similar to 'quickly %' or l_comment similar to ' %about%' or l_comment similar to ' final%' or l_comment similar to ' %final%' or l_comment similar to ' breach%' or l_comment similar to ' egular%' or l_comment similar to ' %closely%' or l_comment similar to ' closely%' or l_comment similar to ' %idea%' or l_comment similar to ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 211313.374 ms
Below is the execution plan.
dev=> explain select sum(l_orderkey), count(*) from lineitem where l_comment similar to 'slyly %' or l_comment similar to 'plant %' or l_comment similar to 'fina %' or l_comment similar to 'quick %' or l_comment similar to 'slyly %' or l_comment similar to 'quickly %' or l_comment similar to ' %about%' or l_comment similar to ' final%' or l_comment similar to ' %final%' or l_comment similar to ' breach%' or l_comment similar to ' egular%' or l_comment similar to ' %closely%' or l_comment similar to ' closely%' or l_comment similar to ' %idea%' or l_comment similar to ' idea%' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- XN Aggregate (cost=13830214.62..13830214.62 rows=1 width=8) -> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=38968908 width=8) Filter: (((l_comment)::text ~ '^( .*idea.*)$'::text) OR ((l_comment)::text ~ '^( idea.*)$'::text) OR ((l_comment)::text ~ '^(fina .*)$'::text) OR ((l_comment)::text ~ '^( .*about.*)$'::text) OR ((l_comment)::text ~ '^( .*final.*)$'::text) OR ((l_comment)::text ~ '^( final.*)$'::text) OR ((l_comment)::text ~ '^(plant .*)$'::text) OR ((l_comment)::text ~ '^(quick .*)$'::text) OR ((l_comment)::text ~ '^(slyly .*)$'::text) OR ((l_comment)::text ~ '^( breach.*)$'::text) OR ((l_comment)::text ~ '^( egular.*)$'::text) OR ((l_comment)::text ~ '^( .*closely.*)$'::text) OR ((l_comment)::text ~ '^( closely.*)$'::text) OR ((l_comment)::text ~ '^(quickly .*)$'::text)) (3 rows) Time: 8.506 ms
AQUA activated (Turn On)
dev=> select sum(l_orderkey), count(*) dev-> from lineitem dev-> where dev-> l_comment similar to 'slyly %' or dev-> l_comment similar to 'plant %' or dev-> l_comment similar to 'fina %' or dev-> l_comment similar to 'quick %' or dev-> l_comment similar to 'slyly %' or dev-> l_comment similar to 'quickly %' or dev-> l_comment similar to ' %about%' or dev-> l_comment similar to ' final%' or dev-> l_comment similar to ' %final%' or dev-> l_comment similar to ' breach%' or dev-> l_comment similar to ' egular%' or dev-> l_comment similar to ' %closely%' or dev-> l_comment similar to ' closely%' or dev-> l_comment similar to ' %idea%' or dev-> l_comment similar to ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 29191.625 ms dev=> select sum(l_orderkey), count(*) dev-> from lineitem dev-> where dev-> l_comment similar to 'slyly %' or dev-> l_comment similar to 'plant %' or dev-> l_comment similar to 'fina %' or dev-> l_comment similar to 'quick %' or dev-> l_comment similar to 'slyly %' or dev-> l_comment similar to 'quickly %' or dev-> l_comment similar to ' %about%' or dev-> l_comment similar to ' final%' or dev-> l_comment similar to ' %final%' or dev-> l_comment similar to ' breach%' or dev-> l_comment similar to ' egular%' or dev-> l_comment similar to ' %closely%' or dev-> l_comment similar to ' closely%' or dev-> l_comment similar to ' %idea%' or dev-> l_comment similar to ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 7512.982 ms
Below is the execution plan.
dev=> explain select sum(l_orderkey), count(*) from lineitem where l_comment similar to 'slyly %' or l_comment similar to 'plant %' or l_comment similar to 'fina %' or l_comment similar to 'quick %' or l_comment similar to 'slyly %' or l_comment similar to 'quickly %' or l_comment similar to ' %about%' or l_comment similar to ' final%' or l_comment similar to ' %final%' or l_comment similar to ' breach%' or l_comment similar to ' egular%' or l_comment similar to ' %closely%' or l_comment similar to ' closely%' or l_comment similar to ' %idea%' or l_comment similar to ' idea%' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- XN Aggregate (cost=13830214.62..13830214.62 rows=1 width=8) -> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=38968908 width=8) Filter: (((l_comment)::text ~ '^( .*idea.*)$'::text) OR ((l_comment)::text ~ '^( idea.*)$'::text) OR ((l_comment)::text ~ '^(fina .*)$'::text) OR ((l_comment)::text ~ '^( .*about.*)$'::text) OR ((l_comment)::text ~ '^( .*final.*)$'::text) OR ((l_comment)::text ~ '^( final.*)$'::text) OR ((l_comment)::text ~ '^(plant .*)$'::text) OR ((l_comment)::text ~ '^(quick .*)$'::text) OR ((l_comment)::text ~ '^(slyly .*)$'::text) OR ((l_comment)::text ~ '^( breach.*)$'::text) OR ((l_comment)::text ~ '^( egular.*)$'::text) OR ((l_comment)::text ~ '^( .*closely.*)$'::text) OR ((l_comment)::text ~ '^( closely.*)$'::text) OR ((l_comment)::text ~ '^(quickly .*)$'::text)) (3 rows) Time: 8.683 ms
Results
SIMILAR TO
queries with AQUA activated significantly improved the performance: 7.4 times in the first time, 28.1 times in the second and later times. The query plans were the same regardless of AQUA, in which the conditions were converted to regular expressions. (The unit of the following table is “seconds”.)
AQUA (Off) | AQUA (On) | Improvement by AQUA | |
---|---|---|---|
Round 1 | 215.896 | 29.191 | 7.4 times faster |
Round 2 | 211.313 | 7.512 | 28.1 times faster |
First run ratio | 1 | 3.9 | - |
Comparison of LIKE
AQUA deactivated (Turn Off)
dev=> select sum(l_orderkey), count(*) dev-> from lineitem dev-> where dev-> l_comment like 'slyly %' or dev-> l_comment like 'plant %' or dev-> l_comment like 'fina %' or dev-> l_comment like 'quick %' or dev-> l_comment like 'slyly %' or dev-> l_comment like 'quickly %' or dev-> l_comment like ' %about%' or dev-> l_comment like ' final%' or dev-> l_comment like ' %final%' or dev-> l_comment like ' breach%' or dev-> l_comment like ' egular%' or dev-> l_comment like ' %closely%' or dev-> l_comment like ' closely%' or dev-> l_comment like ' %idea%' or dev-> l_comment like ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 10276.394 ms dev=> dev=> dev=> select sum(l_orderkey), count(*) from lineitem where l_comment like 'slyly %' or l_comment like 'plant %' or l_comment like 'fina %' or l_comment like 'quick %' or l_comment like 'slyly %' or l_comment like 'quickly %' or l_comment like ' %about%' or l_comment like ' final%' or l_comment like ' %final%' or l_comment like ' breach%' or l_comment like ' egular%' or l_comment like ' %closely%' or l_comment like ' closely%' or l_comment like ' %idea%' or l_comment like ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 6921.963 ms
Below is the execution plan.
dev=> explain select sum(l_orderkey), count(*) from lineitem where l_comment like 'slyly %' or l_comment like 'plant %' or l_comment like 'fina %' or l_comment like 'quick %' or l_comment like 'slyly %' or l_comment like 'quickly %' or l_comment like ' %about%' or l_comment like ' final%' or l_comment like ' %final%' or l_comment like ' breach%' or l_comment like ' egular%' or l_comment like ' %closely%' or l_comment like ' closely%' or l_comment like ' %idea%' or l_comment like ' idea%' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- XN Aggregate (cost=13688958.11..13688958.11 rows=1 width=8) -> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=10717605 width=8) Filter: (((l_comment)::text ~~ ' %idea%'::text) OR ((l_comment)::text ~~ ' %about%'::text) OR ((l_comment)::text ~~ ' %final%'::text) OR ((l_comment)::text ~~ ' %closely%'::text) OR ((l_comment)::text ~~ ' breach%'::text) OR ((l_comment)::text ~~ ' closely%'::text) OR ((l_comment)::text ~~ ' egular%'::text) OR ((l_comment)::text ~~ ' final%'::text) OR ((l_comment)::text ~~ ' idea%'::text) OR ((l_comment)::text ~~ 'fina %'::text) OR ((l_comment)::text ~~ 'plant %'::text) OR ((l_comment)::text ~~ 'quick %'::text) OR ((l_comment)::text ~~ 'quickly %'::text) OR ((l_comment)::text ~~ 'slyly %'::text)) (3 rows) Time: 7.985 ms
AQUA activated (Turn On)
dev=> select sum(l_orderkey), count(*) dev-> from lineitem dev-> where dev-> l_comment like 'slyly %' or dev-> l_comment like 'plant %' or dev-> l_comment like 'fina %' or dev-> l_comment like 'quick %' or dev-> l_comment like 'slyly %' or dev-> l_comment like 'quickly %' or dev-> l_comment like ' %about%' or dev-> l_comment like ' final%' or dev-> l_comment like ' %final%' or dev-> l_comment like ' breach%' or dev-> l_comment like ' egular%' or dev-> l_comment like ' %closely%' or dev-> l_comment like ' closely%' or dev-> l_comment like ' %idea%' or dev-> l_comment like ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 11116.387 ms dev=> dev=> select sum(l_orderkey), count(*) from lineitem where l_comment like 'slyly %' or l_comment like 'plant %' or l_comment like 'fina %' or l_comment like 'quick %' or l_comment like 'slyly %' or l_comment like 'quickly %' or l_comment like ' %about%' or l_comment like ' final%' or l_comment like ' %final%' or l_comment like ' breach%' or l_comment like ' egular%' or l_comment like ' %closely%' or l_comment like ' closely%' or l_comment like ' %idea%' or l_comment like ' idea%' ; sum | count ------------------+--------- 1440371216714447 | 9496106 (1 row) Time: 7526.141 ms
Below is the execution plan.
dev=> explain select sum(l_orderkey), count(*) from lineitem where l_comment like 'slyly %' or l_comment like 'plant %' or l_comment like 'fina %' or l_comment like 'quick %' or l_comment like 'slyly %' or l_comment like 'quickly %' or l_comment like ' %about%' or l_comment like ' final%' or l_comment like ' %final%' or l_comment like ' breach%' or l_comment like ' egular%' or l_comment like ' %closely%' or l_comment like ' closely%' or l_comment like ' %idea%' or l_comment like ' idea%' ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- XN Aggregate (cost=13688958.11..13688958.11 rows=1 width=8) -> XN Seq Scan on lineitem (cost=0.00..13635370.08 rows=10717605 width=8) Filter: (((l_comment)::text ~~ ' %idea%'::text) OR ((l_comment)::text ~~ ' %about%'::text) OR ((l_comment)::text ~~ ' %final%'::text) OR ((l_comment)::text ~~ ' %closely%'::text) OR ((l_comment)::text ~~ ' breach%'::text) OR ((l_comment)::text ~~ ' closely%'::text) OR ((l_comment)::text ~~ ' egular%'::text) OR ((l_comment)::text ~~ ' final%'::text) OR ((l_comment)::text ~~ ' idea%'::text) OR ((l_comment)::text ~~ 'fina %'::text) OR ((l_comment)::text ~~ 'plant %'::text) OR ((l_comment)::text ~~ 'quick %'::text) OR ((l_comment)::text ~~ 'quickly %'::text) OR ((l_comment)::text ~~ 'slyly %'::text)) (3 rows) Time: 8.096 ms
Results
LIKE
queries with AQUA activated resulted in slightly lower performance: 0.9 times both in the first time and after the second time. (The unit of the following table is “seconds”.)
AQUA (Off) | AQUA (On) | Improvement by AQUA | |
---|---|---|---|
Round 1 | 10.276 | 11.116 | 0.9 times (10% slower) |
Round 2 | 6.921 | 7.557 | 0.9 times (10% slower) |
First run ratio | 1.5 | 1.5 | - |
Discussion
By using AQUA, queries filtered with SIMILAR TO
showed 7-28 times of performance improvement, whereas queries with LIKE
were executed a bit slower. AQUA might cause a certain degree of overhead.
In this test, there was a gap in processing time between the first attempt and later attempts, even though I had disabled the result cache. This is possibly due to some causes:
The first run of a query without AQUA includes time to compile a record and to load data from managed S3 to the local storage. Both of these are cached after the second runs. On the other hand, the first run of a query with AQUA includes time to compile a record and presumably to load data from managed S3 to AQUA. This data loading to AQUA might be cached too, but currently, the conditions or forms where AQUA caches are not officially indicated, and we cannot expect how much it caches.
Pricing of AQUA
It is completely free of charge!!!
Conclusion
Through these tests, I could confirm 7-28 times of performance enhancement in the queries filtered with SIMILAR TO
by activating AQUA.
I conducted the test by explicitly turning AQUA on/off, also with other different data. Depending on the queries and workloads, the query execution time got faster when AQUA was not used. I hope the default setting, ‘Automatic’ will be in force soon so that Redshift can automatically decide if AQUA is necessary or not.
Since not all workloads are generally suitable for AQUA, it is recommended to determine whether to activate AQUA based on your use case. At least, the default AQUA setting of ‘Automatic’ is already configured in the clusters which use the supported node types. If the features are further developed such as automating AQUA activation, we could be benefitting from the work of AQUA without even realizing it. Let us be patient and look forward to seeing how Redshift evolves and AQUA advances.
If there is a way to invalidate compile cache and cache of data which is loaded from managed S3 to local storage, a more fine-grained test will be possible. This would then help us consider the issues, challenges, and workarounds that we may have. It is not only about Redshift, but cloud DWHs are getting more and more complex and making it difficult for humans to grasp their behavior. I wish the time comes when we don’t even need to understand their operations but they can keep running always in good shape.
(This article was originally written in Japanese and has been translated by Ito and Norita.)